-
Perception can be highly dependent on stimulus context, but whether and how sensory areas encode the context remains uncertain. We used an ambiguous auditory stimulus, a tritone pair, to investigate the neural activity associated with a preceding contextual stimulus that strongly influenced whether the tritone pair was perceived as an ascending or a descending step in pitch. We recorded single-unit responses from a population of auditory cortical cells in awake ferrets listening to the tritone pairs preceded by the contextual stimulus. We find that the responses adapt locally to the contextual stimulus, consistent with human MEG recordings from the auditory cortex under the same conditions. Decoding the population responses demonstrates that cells responding to pitch changes predict the context-sensitive percept of the tritone pairs well. Conversely, decoding the individual pitch representations and taking their distance in the circular Shepard tone space predicts the opposite of the percept. The various percepts can be readily captured and explained by a neural model of cortical activity based on populations of adapting, pitch, and pitch-direction cells, aligned with the neurophysiological responses. Together, these decoding and model results suggest that contextual influences on perception may already be encoded at the level of the primary sensory cortices, reflecting basic neural response properties commonly found in these areas.
Free, publicly-accessible full text available December 9, 2025
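The geometry behind the tritone's ambiguity can be illustrated with a short sketch. Assuming pitch classes are measured in semitones on a 12-step circle (as in the Shepard tone space), the shortest signed step between two pitch classes defines a "nearest" direction, and a tritone (6 semitones) sits exactly halfway around the circle, so neither direction is shorter. The helper names `circular_step` and `step_direction` are hypothetical, and this is not the decoder used in the study; it only shows why a small shift in either pitch estimate tips the judgement one way or the other.

```python
def circular_step(pc_from: float, pc_to: float, period: float = 12.0) -> float:
    """Signed shortest step (in semitones) from pc_from to pc_to on a pitch-class circle.

    The result lies in [-period/2, period/2): positive means ascending, negative
    means descending. Exactly half a period (a tritone) maps to -period/2.
    """
    d = (pc_to - pc_from) % period   # wrap into [0, period)
    if d >= period / 2:
        d -= period                   # wrap into [-period/2, period/2)
    return d


def step_direction(pc_from: float, pc_to: float, period: float = 12.0,
                   tol: float = 1e-9) -> str:
    """Classify the shortest circular step (hypothetical helper, not the study's decoder)."""
    d = circular_step(pc_from, pc_to, period)
    if abs(d) < tol:
        return "no change"
    if abs(abs(d) - period / 2) < tol:
        return "ambiguous"            # tritone: both directions are equally short
    return "ascending" if d > 0 else "descending"


if __name__ == "__main__":
    print(step_direction(0, 4))    # ascending (major third up)
    print(step_direction(0, 8))    # descending (4 semitones down is shorter)
    print(step_direction(0, 6))    # ambiguous (tritone)
    print(step_direction(0.5, 6))  # ascending (first estimate nudged up by 0.5 semitone)
```

Under this assumed geometry, any bias that moves one decoded pitch slightly off the exact half-circle is enough to resolve the tritone into a definite direction.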
-
Free, publicly-accessible full text available December 1, 2025
-
Free, publicly-accessible full text available December 1, 2025
-
Free, publicly-accessible full text available December 1, 2025
-
Experiments to understand the sensorimotor neural interactions in the human cortical speech system support the existence of a bidirectional flow of interactions between the auditory and motor regions. Their key function is to enable the brain to ‘learn’ how to control the vocal tract for speech production. This idea is the impetus for the recently proposed "MirrorNet", a constrained autoencoder architecture. In this paper, the MirrorNet is applied to learn, in an unsupervised manner, the controls of a specific audio synthesizer (DIVA) to produce melodies only from their auditory spectrograms. The results demonstrate how the MirrorNet discovers synthesizer parameters that generate melodies closely resembling the originals, both for the training melodies and for unseen melodies, and even determines the best set of parameters to approximate renditions of complex piano melodies generated by a different synthesizer. This generalizability illustrates the MirrorNet's potential to discover, from sensory data, the controls of arbitrary motor plants.
-
Recent advancements in deep learning have led to drastic improvements in speech segregation models. Despite their success and growing applicability, few efforts have been made to analyze the underlying principles that these networks learn in order to perform segregation. Here we analyze the role of harmonicity in two state-of-the-art Deep Neural Network (DNN)-based models, Conv-TasNet and DPT-Net [1],[2]. We evaluate their performance on mixtures of natural speech versus inharmonic speech, in which the harmonics are slightly jittered in frequency. We find that performance deteriorates significantly if one source is even slightly harmonically jittered; e.g., an imperceptible 3% harmonic jitter degrades the performance of Conv-TasNet from 15.4 dB to 0.70 dB. Training the model on inharmonic speech does not remedy this sensitivity; instead, it results in worse performance on natural speech mixtures, making inharmonicity a powerful adversarial factor in DNN models. Furthermore, additional analyses reveal that DNN algorithms deviate markedly from biologically inspired algorithms [3] that rely primarily on timing cues, not harmonicity, to segregate speech.
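To make the manipulation concrete, the sketch below generates the partial frequencies of a harmonic complex and of a jittered (inharmonic) version. It assumes that each harmonic n*f0 is shifted independently by a uniform random offset of at most ±jitter*f0, so `jitter=0.03` corresponds to the 3% case; the study's exact jitter scheme may differ, and the function names and the 120 Hz fundamental are hypothetical.

```python
import random
from typing import List, Optional


def harmonic_frequencies(f0: float, n_harmonics: int, jitter: float = 0.0,
                         rng: Optional[random.Random] = None) -> List[float]:
    """Frequencies (Hz) of harmonics 1..n_harmonics of f0.

    ASSUMPTION: with jitter > 0, each harmonic is shifted independently by a
    uniform random offset in [-jitter * f0, +jitter * f0].
    """
    if rng is None:
        rng = random.Random()
    freqs = []
    for n in range(1, n_harmonics + 1):
        offset = rng.uniform(-jitter, jitter) * f0 if jitter > 0 else 0.0
        freqs.append(n * f0 + offset)
    return freqs


def max_relative_deviation(freqs: List[float], f0: float) -> float:
    """Largest |f_n - n*f0| over all partials, expressed as a fraction of f0."""
    return max(abs(f - n * f0) for n, f in enumerate(freqs, start=1)) / f0


if __name__ == "__main__":
    f0 = 120.0  # hypothetical fundamental frequency in Hz
    harmonic = harmonic_frequencies(f0, 10)
    jittered = harmonic_frequencies(f0, 10, jitter=0.03, rng=random.Random(0))
    print(max_relative_deviation(harmonic, f0))  # 0.0 for a perfectly harmonic complex
    print(max_relative_deviation(jittered, f0))  # at most 0.03 under this jitter scheme
```

Seeding the generator makes a given jittered stimulus reproducible, which matters when the same inharmonic mixture is compared across models.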
-
Speech recognition in noisy environments can be challenging and requires listeners to accurately segregate a target speaker from irrelevant background noise. Stochastic figure-ground (SFG) tasks, in which temporally coherent inharmonic pure tones must be identified against a background, have been used to probe the non-linguistic auditory stream segregation processes important for speech-in-noise processing. However, little is known about the relationship between performance on SFG tasks and speech-in-noise tasks, or about the individual differences that may modulate such relationships. In this study, 37 younger normal-hearing adults performed an SFG task with target figure chords consisting of four, six, eight, or ten temporally coherent tones amid a background of randomly varying tones. Stimuli were designed to be spectrally and temporally flat. An increased number of temporally coherent tones resulted in higher accuracy and faster reaction times (RTs). For ten target tones, faster RTs were associated with better scores on the Quick Speech-in-Noise task. Individual differences in working memory capacity and self-reported musicianship further modulated these relationships. Overall, the results demonstrate that the SFG task could serve as an assessment of auditory stream segregation accuracy and RT that is sensitive to individual differences in cognitive and auditory abilities, even among younger normal-hearing adults.
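The stimulus structure of an SFG task can be sketched as follows. This is a minimal illustration, not the study's stimulus code: the frequency pool (129 log-spaced values between 180 Hz and 7250 Hz), the 20 tones per chord, and the chord counts are hypothetical values, and keeping the number of tones per chord constant is only one simple way to keep the sequence flat over time. Figure tones stay fixed across consecutive chords (the temporal coherence a listener must detect), while background tones are redrawn at random for every chord.

```python
import random
from typing import List


def log_spaced_pool(f_min: float = 180.0, f_max: float = 7250.0, n: int = 129) -> List[float]:
    """n frequencies (Hz) evenly spaced on a log scale from f_min to f_max (hypothetical values)."""
    ratio = (f_max / f_min) ** (1.0 / (n - 1))
    return [f_min * ratio ** i for i in range(n)]


def sfg_chords(n_chords: int, tones_per_chord: int, coherence: int,
               figure_start: int, figure_len: int,
               pool: List[float], rng: random.Random) -> List[List[float]]:
    """Build a stochastic figure-ground sequence as a list of chords (sorted Hz values).

    Chords with index in [figure_start, figure_start + figure_len) all contain the
    same `coherence` figure frequencies; every chord has exactly `tones_per_chord`
    tones, so in this sketch only the repetition of the figure sets it apart.
    """
    if coherence > tones_per_chord:
        raise ValueError("coherence cannot exceed tones_per_chord")
    figure = rng.sample(pool, coherence)          # fixed across the figure chords
    chords = []
    for t in range(n_chords):
        fixed = figure if figure_start <= t < figure_start + figure_len else []
        candidates = [f for f in pool if f not in fixed]
        background = rng.sample(candidates, tones_per_chord - len(fixed))  # redrawn each chord
        chords.append(sorted(fixed + background))
    return chords


if __name__ == "__main__":
    rng = random.Random(0)
    pool = log_spaced_pool()
    seq = sfg_chords(n_chords=40, tones_per_chord=20, coherence=10,
                     figure_start=15, figure_len=10, pool=pool, rng=rng)
    shared = set(seq[15]).intersection(*(set(c) for c in seq[16:25]))
    print(len(seq), len(shared))  # shared holds at least the 10 figure tones
```

Varying `coherence` over 4, 6, 8, and 10 would mirror the figure sizes used in the study, with all other (assumed) parameters held constant.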
-
Little is known about how neural representations of natural sounds differ across species. For example, speech and music play a unique role in human hearing, yet it is unclear how auditory representations of speech and music differ between humans and other animals. Using functional ultrasound imaging, we measured responses in ferrets to a set of natural and spectrotemporally matched synthetic sounds previously tested in humans. Ferrets showed lower-level frequency and modulation tuning similar to that observed in humans. However, while humans showed substantially larger responses to natural than to synthetic speech and music in non-primary regions, ferret responses to natural and synthetic sounds were closely matched throughout primary and non-primary auditory cortex, even when tested with ferret vocalizations. This finding reveals that auditory representations in humans and ferrets diverge sharply at late stages of cortical processing, potentially driven by higher-order processing demands in speech and music.
